feat: Add docs for audio models #573
base: main
Conversation
Signed-off-by: lkomali <lkomali@nvidia.com>
Signed-off-by: lkomali <lkomali@nvidia.com>
Signed-off-by: lkomali <lkomali@nvidia.com>
Try out this PR

Quick install:
`pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@0fbc6b28483b70646a68da544f704aac766fb3a5`

Recommended with virtual environment (using uv):
`uv venv --python 3.12 && source .venv/bin/activate`
`uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@0fbc6b28483b70646a68da544f704aac766fb3a5`

Last updated for commit: 0fbc6b28483b70646a68da544f704aac766fb3a5
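After installing, a quick sanity check might be the following (assuming the package exposes an `aiperf` console entry point, which is not stated in this thread):

```bash
# Confirm the CLI installed from the PR commit is available on PATH.
aiperf --help
```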
Walkthrough

A new documentation tutorial is added explaining how to profile Audio Language Models using AIPerf with a vLLM-backed OpenAI-compatible chat endpoint. It covers vLLM server setup (direct and Docker), health verification, synthetic audio generation configuration options, and example CLI invocations for profiling workflows.
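For orientation, a rough sketch of the Docker-based server setup and health check the walkthrough refers to; the `--gpus`/port flags, the default port 8000, and the `/health` route are assumptions based on stock vLLM usage, not quoted from the new doc:

```bash
# Start an OpenAI-compatible vLLM server for an audio-capable model (Docker variant).
docker run --gpus all -p 8000:8000 vllm/vllm-openai:latest \
  --model Qwen/Qwen2-Audio-7B-Instruct

# Health verification: confirm the server is ready before profiling.
curl -f http://localhost:8000/health
```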
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~5 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@docs/tutorials/audio.md`:
- Around line 18-32: Update the vLLM invocation lines to use a valid
--limit-mm-per-prompt syntax: replace the invalid `--limit-mm-per-prompt
audio=2` usage in both the `vllm serve` command and the `docker run ... --model`
invocation with either JSON form `--limit-mm-per-prompt '{"audio": 2}'` or
dotted form `--limit-mm-per-prompt.audio 2`; ensure the change is applied to the
`vllm serve Qwen/Qwen2-Audio-7B-Instruct` example and the `docker run ...
vllm/vllm-openai:latest --model Qwen/Qwen2-Audio-7B-Instruct` example so the
`--limit-mm-per-prompt` flag is syntactically correct.
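A sketch of what the corrected `vllm serve` line could look like under either of the two syntaxes the comment names (only the flag forms themselves come from the review):

```bash
# JSON form of the multimodal limit flag.
vllm serve Qwen/Qwen2-Audio-7B-Instruct --limit-mm-per-prompt '{"audio": 2}'

# Equivalent dotted form.
vllm serve Qwen/Qwen2-Audio-7B-Instruct --limit-mm-per-prompt.audio 2
```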
🧹 Nitpick comments (1)
docs/tutorials/audio.md (1)
87-97: Clarify list parameter usage in examples. The documentation describes `--audio-sample-rates` and `--audio-depths` as lists to "randomly select from," but the examples (lines 62, 79) only show single values (e.g., `--audio-sample-rates 16`). Consider adding a brief note or example showing how to pass multiple values, or clarify that single values are also accepted.

📝 Suggested clarification
- `--audio-sample-rates`: List of sample rates in kHz to randomly select from (default: 16)
  - Example: `--audio-sample-rates 16` (single value) or `--audio-sample-rates 16 24 48` (multiple values)
- `--audio-depths`: List of bit depths to randomly select from (default: 16)
  - Example: `--audio-depths 16` (single value) or `--audio-depths 16 24` (multiple values)
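If the suggestion is adopted, a multi-value run might look roughly like this (the `aiperf profile` subcommand and the `--model` flag are assumptions here; only the two audio flags and `--endpoint-type chat` appear in this review):

```bash
# Hypothetical invocation fragment: AIPerf randomly selects a sample rate and
# bit depth per request from the space-separated lists.
aiperf profile \
  --model Qwen/Qwen2-Audio-7B-Instruct \
  --endpoint-type chat \
  --audio-sample-rates 16 24 48 \
  --audio-depths 16 24
```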
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/tutorials/audio.md
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: ajcasagrande
Repo: ai-dynamo/aiperf PR: 389
File: src/aiperf/endpoints/openai_chat.py:41-46
Timestamp: 2025-10-23T03:16:02.685Z
Learning: In the aiperf project, the ChatEndpoint at src/aiperf/endpoints/openai_chat.py supports video inputs (supports_videos=True) through custom extensions, even though the standard OpenAI /v1/chat/completions API does not natively support raw video inputs.
📚 Learning: 2025-10-23T03:16:02.685Z
Learnt from: ajcasagrande
Repo: ai-dynamo/aiperf PR: 389
File: src/aiperf/endpoints/openai_chat.py:41-46
Timestamp: 2025-10-23T03:16:02.685Z
Learning: In the aiperf project, the ChatEndpoint at src/aiperf/endpoints/openai_chat.py supports video inputs (supports_videos=True) through custom extensions, even though the standard OpenAI /v1/chat/completions API does not natively support raw video inputs.
Applied to files:
docs/tutorials/audio.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: build (macos-latest, 3.12)
- GitHub Check: build (ubuntu-latest, 3.13)
- GitHub Check: integration-tests (ubuntu-latest, 3.13)
- GitHub Check: build (macos-latest, 3.10)
- GitHub Check: integration-tests (ubuntu-latest, 3.10)
- GitHub Check: build (ubuntu-latest, 3.11)
- GitHub Check: build (ubuntu-latest, 3.10)
- GitHub Check: build (macos-latest, 3.13)
- GitHub Check: build (ubuntu-latest, 3.12)
- GitHub Check: build (macos-latest, 3.11)
- GitHub Check: integration-tests (ubuntu-latest, 3.12)
- GitHub Check: integration-tests (ubuntu-latest, 3.11)
🔇 Additional comments (4)
docs/tutorials/audio.md (4)
1-4: LGTM! Copyright header is properly formatted with the correct year and standard SPDX identifiers.
6-11: LGTM! The introduction clearly states the purpose and scope of the tutorial.
69-85: LGTM! The example effectively demonstrates combining audio inputs with text prompts using the `--synthetic-input-tokens-mean` flag.
56-67: No action required. The `--endpoint-type chat` endpoint fully supports audio inputs as a documented feature in AIPerf's ChatEndpoint implementation.
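For concreteness, a sketch of the kind of run those comments describe against the local vLLM server (the subcommand name and the `--model` and `--url` flags are assumptions; `--endpoint-type chat`, `--synthetic-input-tokens-mean`, and the audio flags are the ones quoted in this review):

```bash
# Assumed profiling run combining synthetic text prompts with synthetic audio inputs.
aiperf profile \
  --model Qwen/Qwen2-Audio-7B-Instruct \
  --url http://localhost:8000 \
  --endpoint-type chat \
  --synthetic-input-tokens-mean 128 \
  --audio-sample-rates 16 \
  --audio-depths 16
```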
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!
Signed-off-by: lkomali <lkomali@nvidia.com>
Signed-off-by: lkomali <lkomali@nvidia.com>
Signed-off-by: lkomali <lkomali@nvidia.com>
ajcasagrande left a comment
It looks good. Now that we support loading media from files we may want to mention that.
| {"texts": ["Transcribe this audio."], "audios": ["wav,UklGRiIFAABXQVZFZm10IBAAAAABAAEAgD4AAAB9AAACABAAZGF0Yf4EAAD..."]} | ||
| {"texts": ["What is being said in this recording?"], "audios": ["mp3,SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4Ljc2LjEwMAAAAAAAAAAA..."]} | ||
| {"texts": ["Summarize the main points from this audio."], "audios": ["wav,UklGRooGAABXQVZFZm10IBAAAAABAAEAgD4AAAB9AAACABAAZGF0YWY..."]} |
the ... are just placeholders because the data is long right? might want to mention that
we actually support loading audio from file and converting to base64 automatically, may want to include that, or just change to that. though idk how that would work with the CI
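For illustration only, a hypothetical way to hand-build one such line from a local WAV file, assuming the `"wav,<base64>"` payload shape shown in the quoted examples; if AIPerf really does load files and base64-encode them automatically, as noted above, this manual step is unnecessary (the `...` in the examples are just placeholders for long base64 data):

```bash
# Encode a local WAV file and emit one custom-input JSONL line.
AUDIO_B64=$(base64 < sample.wav | tr -d '\n')
printf '{"texts": ["Transcribe this audio."], "audios": ["wav,%s"]}\n' "$AUDIO_B64" > inputs.jsonl
```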
Docs for profiling with audio models
Summary by CodeRabbit
Release Notes

- Documentation: New tutorial for profiling audio language models with AIPerf.
✏️ Tip: You can customize this high-level summary in your review settings.